Reconsidering the significance of genomic word frequencies.

Authors

  • Miklós Csűrös
  • Laurent Noé
  • Gregory Kucherov

Abstract

By conventional wisdom, a feature that occurs too often or too rarely in a genome can indicate a functional element. To infer functionality from frequency, it is crucial to precisely characterize occurrences in randomly evolving DNA. We find that the frequency of oligonucleotides in a genomic sequence primarily follows a Pareto-lognormal distribution, which encapsulates the lognormal and power-law features found across all known genomes. Such a distribution could be the result of completely random evolution by a copying process. Our characterization of the entire frequency distribution of genomic words opens the way to more accurate reasoning about their over- and underrepresentation in genomic sequences.
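The abstract turns on two computations: counting how often each oligonucleotide (k-mer) occurs, and describing the distribution of those counts. The Python sketch below is illustrative only and rests on stated assumptions. It counts overlapping k-mers, tabulates how many distinct words occur exactly n times, and evaluates a right-tailed Pareto-lognormal density in one common parameterization, where log X is the sum of a normal variable (mean `nu`, standard deviation `tau`) and an exponential variable (rate `alpha`). The paper's exact parameterization and fitting procedure may differ. All function names are hypothetical, and `toy_copy_evolve` is a made-up toy duplication process, not the authors' evolutionary model.

```python
import math
import random
from collections import Counter

DNA = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping occurrences of every length-k word in seq.

    Words containing symbols other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in DNA for c in word):
            counts[word] += 1
    return counts


def frequency_spectrum(counts, k):
    """Map n -> number of distinct k-mers occurring exactly n times.

    The entry for n = 0 counts the possible 4**k words that never occur.
    """
    spectrum = Counter(counts.values())
    spectrum[0] = len(DNA) ** k - len(counts)
    return spectrum


def pareto_lognormal_pdf(x, alpha, nu, tau):
    """Density of X = exp(Y + E), Y ~ Normal(nu, tau^2), E ~ Exponential(alpha).

    Lognormal-like body; power-law tail proportional to x**(-alpha - 1).
    """
    if x <= 0:
        return 0.0
    z = (math.log(x) - nu - alpha * tau * tau) / tau
    normal_cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z
    scale = math.exp(alpha * nu + 0.5 * (alpha * tau) ** 2)
    return alpha * x ** (-alpha - 1.0) * scale * normal_cdf


def pareto_lognormal_sample(alpha, nu, tau, rng):
    """Draw one value by adding a normal and an exponential in log space."""
    return math.exp(rng.gauss(nu, tau) + rng.expovariate(alpha))


def toy_copy_evolve(seq, rounds, seg_len, subs_per_round, rng):
    """HYPOTHETICAL toy copying process (not the paper's model).

    Each round copies a random segment to a random position, then applies
    a few random point substitutions.
    """
    s = list(seq)
    for _ in range(rounds):
        start = rng.randrange(len(s) - seg_len + 1)
        segment = s[start:start + seg_len]
        pos = rng.randrange(len(s) + 1)
        s[pos:pos] = segment  # insert the duplicated segment
        for _ in range(subs_per_round):
            s[rng.randrange(len(s))] = rng.choice(DNA)
    return "".join(s)


if __name__ == "__main__":
    rng = random.Random(1)
    seed_seq = "".join(rng.choice(DNA) for _ in range(2000))
    genome = toy_copy_evolve(seed_seq, rounds=500, seg_len=50,
                             subs_per_round=5, rng=rng)
    spectrum = frequency_spectrum(kmer_counts(genome, 8), 8)
    for n in sorted(spectrum)[:10]:
        print(n, spectrum[n])
```

On real data one would count k-mers in a genome assembly and compare the resulting spectrum (for instance on log-log axes) against a fitted density; this sketch performs no fitting and makes no claim about parameter values.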


Similar sources

Reconsidering the Significance of Genomic Word Frequencies (short title: Genomic Word Frequencies)

NOTICE: this is the authors' version of a work that was accepted for publication in Trends in Genetics. Changes resulting from the publishing process such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Abstract: By conventiona...


The Significance of Education and Gender in Persian Word-selection

This study strives to investigate the importance of ‘education’ and ‘gender’, as two major sociolinguistic variables, in accepting or rejecting the words coined by the Iranian Academy of Persian Language and Literature (APLL). A total of 500 students from state universities in Tehran were chosen as subjects and provided with a questionnaire consisting of 50 APLL equivalents. The respondents’ ac...


Long non-coding RNAs and their significance in human diseases

Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-coding protein transcripts in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...


Association of PIT1 gene and milk protein percentage in Holstein cattle

The pituitary-specific transcription factor (PIT-1) gene is a candidate gene for growth, carcass and also for milk yield traits. In dairy farm animals, the main goal of the selection is the improvement of milk yield and composition. The genes of milk proteins and hormones are excellent candidate genes for linkage analysis with quantitative trait loci (QTL) because of their biological significan...


Journal title:
  • Trends in genetics : TIG

Volume 23, Issue 11

Pages: -

Publication date: 2007